[0034] The present invention is directed to a method and apparatus for implementing a
recent entry queue which complements a Branch Target Buffer (BTB) table 100, as
shown generally in FIG. 1. FIG. 1 shows an example of a BTB table recent entry queue
with compare logic 100. In FIG. 1, a set of data lines 110 supplies data to a first set of
recent entry queue registers 120 for a first entry and to a second set of recent entry queue
registers 121 for a second entry [...] address and valid data. The data in the first set of
registers 120 is compared by a first compare equal comparator 130 with the data being
supplied on lines 110. The data in the second set of registers 121 is compared by a second
compare equal comparator 131 with the data being supplied on lines 110. When the compare
equal comparators detect equality, an output is supplied from either or both of them to the
OR gate 140, which supplies a block write signal on line 150. Through the usage of a BTB
table recent entry queue with compare logic 400, as shown in FIG. 4, three benefits are
acquired, as follows:

1) Removal of the majority of scenarios that can cause duplicate entries in the BTB table 100
of FIG. 1 and the BTB table 610 in FIG. 6.

2) The ability to semi-synchronize the asynchronous interface between branch prediction
and decoding when the latency of branch detection via the BTB table initially places
the BTB table behind the decoding of a pipeline, as happens when the pipeline is starting up
from a cold start or after a branch which was wrongly predicted.

3) For frequently accessed branches, the ability to access them in fewer cycles, thereby
improving the throughput of the branch prediction logic, which in turn improves the overall
throughput of the given microprocessor pipeline 300 of FIG. 3, with its decoding
stage 310, cache address calculation stage 320, cache access stage 330,
register access stage 340, execute and branch resolution stage 350, and register
writeback stage 360.
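The FIG. 1 compare path can be modeled behaviorally in a few lines of Python. This is a sketch, not the patented circuit: the entry fields (`branch_tag`, `target`, `valid`) and the function names are hypothetical, and it assumes each comparator checks the branch tag and target address of a valid stored entry against the incoming data on lines 110.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueueEntry:
    # Hypothetical layout: the text only says the registers hold
    # address and valid data.
    branch_tag: int
    target: int
    valid: bool = True


def compare_equal(reg: Optional[QueueEntry], incoming: QueueEntry) -> bool:
    """One compare equal comparator (130 or 131): true when a valid stored
    entry has the same branch tag and target as the incoming data."""
    return (
        reg is not None
        and reg.valid
        and reg.branch_tag == incoming.branch_tag
        and reg.target == incoming.target
    )


def block_write(reg_120: Optional[QueueEntry],
                reg_121: Optional[QueueEntry],
                incoming: QueueEntry) -> bool:
    """OR gate 140: raise the block write signal (line 150) if either
    comparator reports equality."""
    return compare_equal(reg_120, incoming) or compare_equal(reg_121, incoming)
```

For example, `block_write(QueueEntry(0x100, 0x80), None, QueueEntry(0x100, 0x80))` evaluates to `True`, so the incoming write would be suppressed.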
[0036] When a branch is not predicted, a surprise branch 710 (FIG. 7) may be
encountered. Such a branch is to be written into the BTB table 100 in FIG. 1 and the BTB
table 610 in FIG. 6 so that it can be predicted on its next occurrence. The write takes
place upon the branch resolution stage 350 of that branch, at the execute time frame, when
the target of the branch and the direction of the branch resolution are known. It has been
standard to use the known information at this time frame, for example, and write the branch
into the BTB table 610. A branch can be a surprise branch 710 for one of two reasons: it was
not in the queue, or it was in the queue but it was not found in time. In the latter case, the
branch should ideally not be added into the BTB table 610 again, as doing so would most
likely replace some other good entry, different from the duplicate entry being written, in
the BTB table 610. Through the usage of a recent entry queue 400 in FIG. 4, or a BTB recent
entry queue 620 in FIG. 6, following the timeline for write queue access and BTB writing 500
in FIG. 5, whenever new data in entry 140 in FIG. 1 is to be written into the BTB table 610
during the send data to queue interval 510, it is first compared by the first and second
compare equal comparators 430, 431 (entry in the recent entry queue decision block 720 in
FIG. 7) to the branch tag entries 420, 421 within the recent entry queue. If it matches one of
the entries in the recent entry queue 620, as indicated by an output from OR gate 440 in
FIG. 4 during the check for duplicate entry interval 520 in FIG. 5, then the data in entry 140,
or data in entry 410 in FIG. 4, is blocked by block write 450 in FIG. 4 from being written into
the BTB table 610, as it already exists somewhere within the BTB table 610. If the entry is
not located within the recent entry queue registers 420, 421, 620, then the entry is written
during the write BTB write queue interval 530 into both the BTB table 610 and the recent
entry queue 620. The recent entry queue works as a first-in, first-out (FIFO) queue
structure, with data in 410 entering registers 420 and leaving through registers 421. When a
new entry is placed into the queue, the oldest entry in the queue is moved out to make room
for the newest entry. Should an entry in the BTB table 610 be required to be invalidated for
any reason, the recent entry queue registers 420 and 421 must be checked to determine
whether the entry is also contained within them. If the entry is in the recent entry queue
registers 420, 421 and the entry is being invalidated in the BTB table 610, then the entry
must also be invalidated in the recent entry queue registers 420, 421.
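A minimal behavioral sketch of the duplicate check, FIFO replacement, and invalidation described above follows. It assumes a BTB modeled as a plain Python list (so an unblocked duplicate shows up as a second copy), entries reduced to hypothetical `(branch tag, target)` tuples, and a queue depth of two standing in for registers 420 and 421; the class and function names are illustrative only.

```python
from collections import deque
from typing import Deque, List, Tuple

Entry = Tuple[int, int]  # (branch tag, target address); assumed layout


class RecentEntryQueue:
    """FIFO of the most recently written BTB entries.

    With depth=2 the two slots play the role of registers 420 and 421.
    """

    def __init__(self, depth: int = 2) -> None:
        # A bounded deque drops its oldest item when a new one is appended
        # to a full queue, matching the FIFO replacement in the text.
        self.slots: Deque[Entry] = deque(maxlen=depth)

    def matches(self, tag: int) -> bool:
        # Stands in for compare equal comparators 430/431 feeding OR gate 440.
        return any(t == tag for t, _ in self.slots)

    def push(self, entry: Entry) -> None:
        self.slots.append(entry)

    def invalidate(self, tag: int) -> None:
        # Drop any slot holding this branch tag, keeping the others in order.
        kept = [e for e in self.slots if e[0] != tag]
        self.slots.clear()
        self.slots.extend(kept)


def write_branch(btb: List[Entry], req: RecentEntryQueue, entry: Entry) -> bool:
    """Try to install a resolved surprise branch; return False if blocked."""
    # Interval 520: check the recent entry queue for a duplicate.
    if req.matches(entry[0]):
        return False  # block write 450: the branch is already in the BTB
    # Interval 530: write into both the BTB and the recent entry queue.
    btb.append(entry)
    req.push(entry)
    return True


def invalidate_branch(btb: List[Entry], req: RecentEntryQueue, tag: int) -> None:
    """Invalidate a branch in the BTB and, if present, in the queue as well."""
    btb[:] = [e for e in btb if e[0] != tag]
    req.invalidate(tag)
```

Because the queue only remembers the last few writes, a duplicate whose first copy has already aged out of the queue is not caught. This is consistent with the text claiming removal of the majority of duplicate-entry scenarios rather than all of them.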
[0037] FIG. 6 illustrates one example of a recent entry queue modifying the BTB table hit
detect logic. In FIG. 6, block 600 includes a BTB table array 610, a recent entry queue 620
and a Hit Detect Compare Logic 630, which provides a hit detect output on line 640.
FIG. 6 thus illustrates the recent entry queue 620 modifying the status of the BTB table
array 610. Since the recent entry queue 620 is a substantially smaller subset of the BTB
table array 610, searching for branch/target pairs in the recent entry queue 620 is
significantly faster than searching in the far larger BTB table array 610. In the cycle when
the BTB table 610 is being accessed for a given row based on the search index, the recent
entry queue 620 can be compared against the hit detect compare logic criteria 630 in
parallel. Hence, whenever a new search is started, during the cycle when a read is being
performed from the BTB table array 610, the recent entry queue 620 is comparing its
contents using the Hit Detect Compare Logic 630 in that same cycle. The ability to do a hit
detect 640, or search match, a cycle earlier improves the latency of the branch prediction
logic for tight looping branches, where the same branch is accessed repetitively and the
BTB table array 610 by itself is unable to keep up because of the initial time required to
access it. As stated above, the recent entry queue 620 maintains a depth up to the
associativity of the BTB table array 610, whereby, while the BTB is indexed, the recent entry
queue positions are input to the comparison logic: compare equal comparators 430, 431 in
FIG. 4 and hit detect compare logic 630 in FIG. 6. The recent entry queue is searched to its
full depth for a matching branch in parallel with the searching of the BTB output, where the
hit detect compare logic 630 supports the associativity of the BTB table array 610. In
searching the BTB table array 610 for the next predicted branch, the search strategy uses
the recent entry queue as a subset of the BTB table array 610, preferably providing fast
indexing of recently encountered branches.
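The parallel lookup of FIG. 6 can be approximated as follows. The cycle numbers (`REQ_HIT_CYCLE = 1`, `BTB_HIT_CYCLE = 2`), the modulo `index_of` function, and exact-tag matching are all assumptions for illustration; a real BTB search looks for the next predicted branch from a search address rather than for an exact tag. The depth assertion mirrors the statement that the queue depth is at most the BTB associativity, presumably so the same per-way hit detect comparators can also examine every queue slot.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (branch tag, target address); assumed layout

REQ_HIT_CYCLE = 1  # assumed: the small queue is compared in the first cycle
BTB_HIT_CYCLE = 2  # assumed: the array read completes one cycle later


def index_of(tag: int, rows: int) -> int:
    """Hypothetical row index function (low-order bits of the tag)."""
    return tag % rows


def search(btb_rows: List[List[Entry]], req: List[Entry], tag: int,
           ways: int) -> Optional[Tuple[Entry, int]]:
    """Return (matching entry, cycle the hit detect fires) or None.

    Both lookups conceptually start in the same cycle; the queue answers first.
    """
    # Assumed reading of the text: queue depth never exceeds associativity.
    assert len(req) <= ways, "queue deeper than BTB associativity"
    for entry in req:  # models the parallel compare of all queue slots
        if entry[0] == tag:
            return entry, REQ_HIT_CYCLE
    row = btb_rows[index_of(tag, len(btb_rows))]
    for entry in row[:ways]:  # one comparator per way of the indexed row
        if entry[0] == tag:
            return entry, BTB_HIT_CYCLE
    return None
```

A branch that was just written (and is therefore still in the queue) is reported one cycle earlier than one found only in the array, which is the benefit claimed for tight loops.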
[0038] FIG. 7 illustrates one example of a state machine description of a recent entry
queue delaying decoding. In FIG. 7, block 700 illustrates this state machine. As shown,
because the BTB table 610 is working asynchronously from the decoding stage 310 of
instructions in the microprocessor pipeline 300 of FIG. 3, it is possible for the pipeline to
decode, with the decoding stage 310, a branch which is predictable but was not yet found
by the BTB table 610. In such cases, the branch is deemed a surprise branch 710, and upon
operation of the branch resolution stage 350, the execution cycle of the branch, it will once
again be placed into the BTB table 610. Where this missed branch is a loop branch, it will
continue not to be predicted, because after each occurrence of not finding it, the branch
prediction logic will restart based on the surprise branch decision block 710. In the case
that the branch is already in the BTB table 610, as detected by OR gate 440 of the recent
entry queue 620 (recent entry queue decision block 720), the branch is resolved taken
(decision block 730), and the branch repeatedly occurs with a negative offset in the branch
point (decision block 740), the recent entry queue 620 can be used to detect this scenario
and cause the decoding 310 of microprocessor pipeline 300 after the branch to be delayed
by the delay decode step 760 until the first prediction via the BTB table 610 is made, such
that the predicted branch address can be compared against all future decodes. By causing
a delay in the decoding of the pipeline 300 with the delay decode step 760, the next
iteration of the branch will be predicted by the BTB table 610 in time. Given that a BTB
table 610 can have higher latency on start-up compared to its latency once it is running,
the one-time delay of decoding imposed by the delay decode block 760, rather than the
decode in normal fashion step 750, can be enough to allow the branch to be predicted for
all future iterations of the current looping pattern.
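The FIG. 7 decisions reduce to a chain of checks. In the sketch below, the function and enum names are hypothetical, and a negative offset is assumed to mean a branch whose target address is lower than the branch's own address (a backward, loop-closing branch).

```python
from enum import Enum


class DecodeAction(Enum):
    NORMAL = 750  # decode in normal fashion step 750
    DELAY = 760   # delay decode step 760


def next_decode_action(surprise: bool, in_recent_queue: bool, taken: bool,
                       branch_addr: int, target_addr: int) -> DecodeAction:
    """Walk decision blocks 710 to 740 for a resolved branch."""
    if not surprise:            # 710: the BTB predicted it in time
        return DecodeAction.NORMAL
    if not in_recent_queue:     # 720: not in the recent entry queue
        return DecodeAction.NORMAL
    if not taken:               # 730: resolved not taken
        return DecodeAction.NORMAL
    if target_addr >= branch_addr:  # 740: assumed test for a negative offset
        return DecodeAction.NORMAL
    # Known, taken, backward branch that the BTB keeps missing: hold decode
    # once so the BTB's first prediction can catch up.
    return DecodeAction.DELAY
```

Any failed check falls through to normal decoding (750); only the combination of all four conditions triggers the one-time delay (760).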
[0040] As one example, one or more aspects of the present invention can be included in an
article of manufacture (e.g., one or more computer program products) having, for instance, a 
computer usable media having computer readable code thereon for controlling and 
configuring a computer machine having a pipelined processor and a branch target buffer 
(BTB) creating a recent entry queue in parallel with the branch target buffer (BTB). The 
computer usable media has embodied therein, for instance, computer readable program 
code means for providing and facilitating the capabilities of the present invention. The
article of manufacture can be included as a part of a computer system or sold separately. 